Overview

Dataset Statistics

Number of Variables 20
Number of Rows 11922
Missing Cells 27132
Missing Cells (%) 11.4%
Duplicate Rows 0
Duplicate Rows (%) 0.0%
Total Size in Memory 14.1 MB
Average Row Size in Memory 1.2 KB
Variable Types
  • Numerical: 10
  • Categorical: 10
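
The missing-cell percentage can be checked against the counts above: the table holds 20 × 11922 cells, of which 27132 are missing. The snippet below is only an arithmetic check; the variable names are illustrative and not part of the report.

```python
# Figures copied from the Dataset Statistics table above.
n_variables = 20
n_rows = 11922
missing_cells = 27132

# Every row has one cell per variable.
total_cells = n_variables * n_rows
missing_pct = 100 * missing_cells / total_cells

print(f"{missing_cells} of {total_cells} cells missing ({missing_pct:.1f}%)")
# prints: 27132 of 238440 cells missing (11.4%)
```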

Dataset Insights

neighbourhood_group has 11922 (100.0%) missing values Missing
monthly_price has 10879 (91.25%) missing values Missing
last_review has 2160 (18.12%) missing values Missing
reviews_per_month has 2163 (18.14%) missing values Missing
host_id is skewed Skewed
latitude is skewed Skewed
longitude is skewed Skewed
price is skewed Skewed
minimum_nights is skewed Skewed
number_of_reviews is skewed Skewed
reviews_per_month is skewed Skewed
calculated_host_listings_count is skewed Skewed
availability_365 is skewed Skewed
name has a high cardinality: 11780 distinct values High Cardinality
host_name has a high cardinality: 2980 distinct values High Cardinality
neighbourhood has a high cardinality: 103 distinct values High Cardinality
amenities has a high cardinality: 11130 distinct values High Cardinality
monthly_price has a high cardinality: 329 distinct values High Cardinality
last_review has a high cardinality: 890 distinct values High Cardinality
neighbourhood_group has all distinct values Unique
longitude has 11922 (100.0%) negatives Negatives
number_of_reviews has 2153 (18.06%) zeros Zeros
availability_365 has 2356 (19.76%) zeros Zeros

Variables

id

numerical

Approximate Distinct Count 11922
Approximate Unique (%) 100.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 186.3 KB
Mean 1.7286e+07
Minimum 6
Maximum 3.005e+07
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • id is skewed left (γ1 = -0.3959)

Quantile Statistics

Minimum 6
5-th Percentile 2.1725e+06
Q1 1.1595e+07
Median 1.8599e+07
Q3 2.4037e+07
95-th Percentile 2.8763e+07
Maximum 3.005e+07
Range 3.005e+07
IQR 1.2443e+07

Descriptive Statistics

Mean 1.7286e+07
Standard Deviation 8.2054e+06
Variance 6.7329e+13
Sum 2.0608e+11
Skewness -0.3959
Kurtosis -0.8903
Coefficient of Variation 0.4747

name

categorical

Approximate Distinct Count 11780
Approximate Unique (%) 98.8%
Missing 2
Missing (%) 0.0%
Memory Size 1.2 MB

Length

Mean 37.8598
Standard Deviation 12.8851
Median 38
Minimum 1
Maximum 255

Sample

1st row Large Craftsmen w/...
2nd row Ocean front condo ...
3rd row Sunset Cliffs Stud...
4th row Art Studio Retreat...
5th row OB cottage SD--vie...

Letter

Count 364208
Lowercase Letter 294739
Space Separator 64938
Uppercase Letter 69469
Dash Punctuation 2173
Decimal Number 6794
  • name contains many words: 5160 words
  • The most frequent value (beach) occurs over 1.58 times as often as the second most frequent value (private)

host_id

numerical

Approximate Distinct Count 7095
Approximate Unique (%) 59.5%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 186.3 KB
Mean 6.0858e+07
Minimum 29
Maximum 2.2518e+08
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • host_id is skewed right (γ1 = 1.0099)

Quantile Statistics

Minimum 29
5-th Percentile 1.7584e+06
Q1 1.2243e+07
Median 3.8691e+07
Q3 9.5051e+07
95-th Percentile 1.868e+08
Maximum 2.2518e+08
Range 2.2518e+08
IQR 8.2808e+07

Descriptive Statistics

Mean 6.0858e+07
Standard Deviation 5.8505e+07
Variance 3.4228e+15
Sum 7.2555e+11
Skewness 1.0099
Kurtosis 0.00093387
Coefficient of Variation 0.9613
  • host_id is not normally distributed (p-value 5.940363296357599e-11)
  • host_id has 77 outliers

host_name

categorical

Approximate Distinct Count 2980
Approximate Unique (%) 25.0%
Missing 6
Missing (%) 0.1%
Memory Size 836.8 KB

Length

Mean 6.8082
Standard Deviation 4.0658
Median 6
Minimum 1
Maximum 34

Sample

1st row Sara
2nd row Jef Karchin'S MISS...
3rd row Marin
4th row Chris And Jean
5th row Melissa

Letter

Count 76897
Lowercase Letter 62194
Space Separator 3029
Uppercase Letter 14703
Dash Punctuation 27
Decimal Number 427
  • host_name contains many words: 2721 words

neighbourhood_group

categorical

Approximate Distinct Count 1
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 791.7 KB

Length

Mean 3
Standard Deviation 0
Median 3
Minimum 3
Maximum 3

Sample

1st row nan
2nd row nan
3rd row nan
4th row nan
5th row nan

Letter

Count 35766
Lowercase Letter 35766
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 0
  • neighbourhood_group has words of constant length
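
The figures for neighbourhood_group (Missing 0, one distinct value, constant length 3, sample rows reading nan) are what one would see if missing values had been cast to the text "nan" before the categorical statistics were computed. That is an inference from the numbers, not something the report states. A minimal sketch of the arithmetic under that assumption:

```python
# Assumption: each missing value was converted to the string "nan".
n_rows = 11922
as_text = str(float("nan"))  # the text form of a float NaN

column = [as_text] * n_rows

# Three lowercase letters per row.
total_letters = sum(len(value) for value in column)
distinct_values = len(set(column))

print(as_text, total_letters, distinct_values)
# prints: nan 35766 1
```

Under this reading, the Lowercase Letter count of 35766 is simply 3 × 11922, and the column carries no usable information.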

neighbourhood

categorical

Approximate Distinct Count 103
Approximate Unique (%) 0.9%
Missing 0
Missing (%) 0.0%
Memory Size 887.4 KB

Length

Mean 11.2237
Standard Deviation 3.0927
Median 11
Minimum 4
Maximum 27

Sample

1st row North Hills
2nd row Mission Bay
3rd row Ocean Beach
4th row North Hills
5th row Loma Portal

Letter

Count 122896
Lowercase Letter 100061
Space Separator 10890
Uppercase Letter 22835
Dash Punctuation 23
Decimal Number 0

city

categorical

Approximate Distinct Count 47
Approximate Unique (%) 0.4%
Missing 0
Missing (%) 0.0%
Memory Size 861.9 KB
  • The most frequent value (San Diego) occurs over 38.82 times as often as the second most frequent value (La Jolla)

Length

Mean 9.0231
Standard Deviation 0.5985
Median 9
Minimum 2
Maximum 28

Sample

1st row San Diego
2nd row San Diego
3rd row San Diego
4th row San Diego
5th row San Diego

Letter

Count 95615
Lowercase Letter 71767
Space Separator 11937
Uppercase Letter 23848
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (San Diego, La Jolla) take over 50.0%

latitude

numerical

Approximate Distinct Count 11919
Approximate Unique (%) 100.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 186.3 KB
Mean 32.7703
Minimum 32.5326
Maximum 33.0861
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • latitude is skewed right (γ1 = 0.8713)

Quantile Statistics

Minimum 32.5326
5-th Percentile 32.7066
Q1 32.7265
Median 32.7595
Q3 32.7996
95-th Percentile 32.9174
Maximum 33.0861
Range 0.5535
IQR 0.07307

Descriptive Statistics

Mean 32.7703
Standard Deviation 0.06564
Variance 0.004308
Sum 390687.4177
Skewness 0.8713
Kurtosis 2.3387
Coefficient of Variation 0.002003
  • latitude is not normally distributed (p-value 1.7329451017571215e-07)
  • latitude has 755 outliers

longitude

numerical

Approximate Distinct Count 11857
Approximate Unique (%) 99.5%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 186.3 KB
Mean -117.1819
Minimum -117.2814
Maximum -116.9335
Zeros 0
Zeros (%) 0.0%
Negatives 11922
Negatives (%) 100.0%
  • longitude is skewed right (γ1 = 0.5695)

Quantile Statistics

Minimum -117.2814
5-th Percentile -117.2596
Q1 -117.2457
Median -117.1685
Q3 -117.1412
95-th Percentile -117.0664
Maximum -116.9335
Range 0.3479
IQR 0.1045

Descriptive Statistics

Mean -117.1819
Standard Deviation 0.06433
Variance 0.004139
Sum -1.397e+06
Skewness 0.5695
Kurtosis 0.08424
Coefficient of Variation -0.00054901
  • longitude is not normally distributed (p-value 5.550910303445427e-10)
  • longitude has 91 outliers

property_type

categorical

Approximate Distinct Count 33
Approximate Unique (%) 0.3%
Missing 0
Missing (%) 0.0%
Memory Size 847.8 KB

Length

Mean 7.8229
Standard Deviation 2.6128
Median 9
Minimum 3
Maximum 29

Sample

1st row House
2nd row Condominium
3rd row Guesthouse
4th row Tiny house
5th row House

Letter

Count 92608
Lowercase Letter 80578
Space Separator 597
Uppercase Letter 12030
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (House, Apartment) take over 50.0%

room_type

categorical

Approximate Distinct Count 3
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 920.6 KB
  • The most frequent value (Entire home/apt) occurs over 2.42 times as often as the second most frequent value (Private room)

Length

Mean 14.0735
Standard Deviation 1.4092
Median 15
Minimum 11
Maximum 15

Sample

1st row Entire home/apt
2nd row Entire home/apt
3rd row Entire home/apt
4th row Entire home/apt
5th row Entire home/apt

Letter

Count 147558
Lowercase Letter 135636
Space Separator 11922
Uppercase Letter 11922
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (Entire home/apt, Private room) take over 50.0%

amenities

categorical

Approximate Distinct Count 11130
Approximate Unique (%) 93.4%
Missing 0
Missing (%) 0.0%
Memory Size 7.2 MB
  • The most frequent value ({}) occurs over 1.59 times as often as the second most frequent value ({TV,"Cable TV",Wifi,"Air conditioning",Pool,Kitchen,"Free parking on premises",Gym,Elevator,"Free street parking","Hot tub",Heating,Washer,Dryer,"Smoke detector","Carbon monoxide detector","Fire extinguisher",Essentials,Shampoo,Hangers,"Hair dryer",Iron,"Laptop friendly workspace","Private living room","Hot water","Bed linens",Microwave,"Coffee maker",Refrigerator,Dishwasher,"Dishes and silverware","Cooking basics",Oven,Stove,"BBQ grill","Patio or balcony","Long term stays allowed",Other})

Length

Mean 384.1409
Standard Deviation 195.8687
Median 342
Minimum 2
Maximum 1613

Sample

1st row {TV,Internet,Wifi,...
2nd row {TV,"Cable TV",Int...
3rd row {Internet,Wifi,Kit...
4th row {Internet,Wifi,Poo...
5th row {TV,Internet,Wifi,...

Letter

Count 3576213
Lowercase Letter 3239501
Space Separator 277142
Uppercase Letter 336712
Dash Punctuation 14342
Decimal Number 9796
  • amenities contains many words: 3200 words
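
The high cardinality of amenities comes from each row storing a whole list as one brace-delimited string, with multi-word items in double quotes. One possible way to split such a string into items is sketched below. The function name parse_amenities is hypothetical, and the sketch assumes the format seen in the samples above (comma-separated, optional double quotes, surrounding braces).

```python
import csv


def parse_amenities(text):
    """Split a string such as '{TV,"Cable TV",Wifi}' into a list of items."""
    inner = text.strip()
    # Drop the surrounding braces if present.
    if inner.startswith("{") and inner.endswith("}"):
        inner = inner[1:-1]
    if not inner:
        return []  # "{}" means no amenities listed
    # csv handles the double-quoted, comma-separated items.
    return next(csv.reader([inner]))


# Shortened example in the same format as the report's samples.
items = parse_amenities('{TV,"Cable TV",Wifi,"Air conditioning",Pool}')
print(items)
# prints: ['TV', 'Cable TV', 'Wifi', 'Air conditioning', 'Pool']
```

Once split, individual amenities can be counted or turned into indicator columns, which is usually more informative than treating the raw string as a category.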

price

numerical

Approximate Distinct Count 717
Approximate Unique (%) 6.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 186.3 KB
Mean 212.7079
Minimum 0
Maximum 10000
Zeros 2
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • price is skewed right (γ1 = 11.143)

Quantile Statistics

Minimum 0
5-th Percentile 45
Q1 80
Median 130
Q3 249
95-th Percentile 600
Maximum 10000
Range 10000
IQR 169

Descriptive Statistics

Mean 212.7079
Standard Deviation 306.9918
Variance 94243.9374
Sum 2.5359e+06
Skewness 11.143
Kurtosis 245.3259
Coefficient of Variation 1.4433
  • price is not normally distributed (p-value 2.6158690087591736e-23)
  • price has 829 outliers
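
Several of the price figures above are derived from others: the IQR is Q3 minus Q1, and the coefficient of variation is the standard deviation divided by the mean. The sketch below recomputes them and also shows the common 1.5 × IQR fences for flagging outliers. The report does not say which rule produced the count of 829 outliers, so the fence rule here is an assumption, and the helper is_outlier is hypothetical.

```python
# Figures copied from the price tables above.
q1, q3 = 80, 249
mean, std = 212.7079, 306.9918

iqr = q3 - q1    # interquartile range
cv = std / mean  # coefficient of variation

# Assumed rule: values beyond 1.5 * IQR from the quartiles are outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr


def is_outlier(price):
    """Return True if price lies outside the assumed 1.5 * IQR fences."""
    return price < lower_fence or price > upper_fence


print(iqr, round(cv, 4), lower_fence, upper_fence)
```

With these fences the lower bound is negative, so only high prices can be flagged; the 95th percentile (600) already lies above the upper fence.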

monthly_price

categorical

Approximate Distinct Count 329
Approximate Unique (%) 31.5%
Missing 10879
Missing (%) 91.2%
Memory Size 76.3 KB

Length

Mean 9.8658
Standard Deviation 0.5706
Median 10
Minimum 8
Maximum 11

Sample

1st row $2,150.00
2nd row $2,500.00
3rd row $1,450.00
4th row $5,500.00
5th row $1,050.00

Letter

Count 0
Lowercase Letter 0
Space Separator 1043
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 6201
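
monthly_price is reported as categorical because its values are currency-formatted strings such as $2,150.00. A minimal sketch of converting them to numbers follows; the function name parse_price is hypothetical, and the sketch assumes every non-missing value uses a leading dollar sign and comma thousands separators, as in the samples above.

```python
from decimal import Decimal


def parse_price(text):
    """Convert a string such as '$2,150.00' to a Decimal; None if missing."""
    if text is None:
        return None
    cleaned = text.strip().replace("$", "").replace(",", "")
    if not cleaned or cleaned.lower() == "nan":
        return None
    return Decimal(cleaned)


# The five sample rows shown above.
samples = ["$2,150.00", "$2,500.00", "$1,450.00", "$5,500.00", "$1,050.00"]
parsed = [parse_price(s) for s in samples]
print(parsed[0])
# prints: 2150.00
```

Even after conversion, about 91% of the column is missing, so it may be of limited use for analysis.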

minimum_nights

numerical

Approximate Distinct Count 50
Approximate Unique (%) 0.4%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 186.3 KB
Mean 4.6532
Minimum 1
Maximum 500
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • minimum_nights is skewed right (γ1 = 15.213)

Quantile Statistics

Minimum 1
5-th Percentile 1
Q1 1
Median 2
Q3 3
95-th Percentile 30
Maximum 500
Range 499
IQR 2

Descriptive Statistics

Mean 4.6532
Standard Deviation 14.4826
Variance 209.7449
Sum 55475
Skewness 15.213
Kurtosis 347.305
Coefficient of Variation 3.1124
  • minimum_nights is not normally distributed (p-value 4.923235149188921e-25)
  • minimum_nights has 1175 outliers

number_of_reviews

numerical

Approximate Distinct Count 336
Approximate Unique (%) 2.8%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 186.3 KB
Mean 28.9004
Minimum 0
Maximum 686
Zeros 2153
Zeros (%) 18.1%
Negatives 0
Negatives (%) 0.0%
  • number_of_reviews is skewed right (γ1 = 3.6357)

Quantile Statistics

Minimum 0
5-th Percentile 0
Q1 1
Median 8
Q3 33
95-th Percentile 129
Maximum 686
Range 686
IQR 32

Descriptive Statistics

Mean 28.9004
Standard Deviation 51.4705
Variance 2649.2172
Sum 344551
Skewness 3.6357
Kurtosis 19.0926
Coefficient of Variation 1.781
  • number_of_reviews is not normally distributed (p-value 1.4130936840563656e-23)
  • number_of_reviews has 1221 outliers

last_review

categorical

Approximate Distinct Count 890
Approximate Unique (%) 9.1%
Missing 2160
Missing (%) 18.1%
Memory Size 708.0 KB

Length

Mean 9.2654
Standard Deviation 0.6548
Median 9
Minimum 8
Maximum 10

Sample

1st row 10/7/2018
2nd row 11/2/2015
3rd row 11/7/2018
4th row 7/28/2014
5th row 10/28/2018

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 70925
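
last_review is also reported as categorical because it holds date strings. The samples (for example 7/28/2014 and 10/28/2018) indicate a month/day/year layout without zero padding. The sketch below parses that layout with the standard library; the function name parse_review_date is hypothetical, and it assumes all non-missing values follow the same layout.

```python
from datetime import date, datetime


def parse_review_date(text):
    """Parse a 'month/day/year' string such as '7/28/2014'; None if missing."""
    if text is None or not text.strip() or text.strip().lower() == "nan":
        return None
    return datetime.strptime(text.strip(), "%m/%d/%Y").date()


# The five sample rows shown above.
samples = ["10/7/2018", "11/2/2015", "11/7/2018", "7/28/2014", "10/28/2018"]
dates = [parse_review_date(s) for s in samples]
print(min(dates), max(dates))
# prints: 2014-07-28 2018-11-07
```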

reviews_per_month

numerical

Approximate Distinct Count 865
Approximate Unique (%) 8.9%
Missing 2163
Missing (%) 18.1%
Infinite 0
Infinite (%) 0.0%
Memory Size 152.5 KB
Mean 1.9358
Minimum 0.01
Maximum 18.95
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • reviews_per_month is skewed right (γ1 = 1.5825)

Quantile Statistics

Minimum 0.01
5-th Percentile 0.07
Q1 0.36
Median 1.14
Q3 2.95
95-th Percentile 6.022
Maximum 18.95
Range 18.94
IQR 2.59

Descriptive Statistics

Mean 1.9358
Standard Deviation 2.0563
Variance 4.2283
Sum 18891.05
Skewness 1.5825
Kurtosis 2.9502
Coefficient of Variation 1.0623
  • reviews_per_month is not normally distributed (p-value 2.9518931730539836e-16)
  • reviews_per_month has 306 outliers

calculated_host_listings_count

numerical

Approximate Distinct Count 40
Approximate Unique (%) 0.3%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 186.3 KB
Mean 10.3261
Minimum 1
Maximum 161
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • calculated_host_listings_count is skewed right (γ1 = 4.2046)

Quantile Statistics

Minimum 1
5-th Percentile 1
Q1 1
Median 2
Q3 5
95-th Percentile 55
Maximum 161
Range 160
IQR 4

Descriptive Statistics

Mean 10.3261
Standard Deviation 26.208
Variance 686.8566
Sum 123108
Skewness 4.2046
Kurtosis 18.514
Coefficient of Variation 2.538
  • calculated_host_listings_count is not normally distributed (p-value 8.080369304642773e-25)
  • calculated_host_listings_count has 1893 outliers

availability_365

numerical

Approximate Distinct Count 366
Approximate Unique (%) 3.1%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 186.3 KB
Mean 152.4864
Minimum 0
Maximum 365
Zeros 2356
Zeros (%) 19.8%
Negatives 0
Negatives (%) 0.0%
  • availability_365 is skewed right (γ1 = 0.3187)

Quantile Statistics

Minimum 0
5-th Percentile 0
Q1 22
Median 132
Q3 287
95-th Percentile 361
Maximum 365
Range 365
IQR 265

Descriptive Statistics

Mean 152.4864
Standard Deviation 131.0392
Variance 17171.2633
Sum 1.8179e+06
Skewness 0.3187
Kurtosis -1.3997
Coefficient of Variation 0.8593
  • availability_365 is not normally distributed (p-value 1.8515959975118897e-22)

Interactions

Correlations

Missing Values